Combining Machine Readable Lexical Resources and Bilingual Corpora for Broad Word Sense Disambiguation
Abstract
This paper describes a new approach to word sense disambiguation (WSD) based on an automatically acquired "word sense division." Semantically related sense entries in a bilingual dictionary are grouped into clusters by a heuristic labeling algorithm, yielding a more complete and appropriate sense division for WSD. The multiple translations of each sense serve as outside information for automatically tagging bilingual corpora and acquiring WSD rules. We describe and implement a WSD method using LecDOCE, the English-Chinese bilingual version of the Longman Dictionary of Contemporary English (LDOCE). To this end, we draw on the topics and topical sets of the Longman Lexicon of Contemporary English (LLOCE) to represent and disambiguate LecDOCE senses. Example sentences and their translations from LecDOCE serve as training material for WSD, while further examples from the Brown corpus are used for testing. Quantitative results for disambiguating 12 words are also presented.
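As a rough illustration of the pipeline the abstract outlines (cluster related senses by topic, tag bilingual examples through their translations, learn from the tagged examples, then disambiguate new text), here is a minimal Python sketch on entirely hypothetical data. It is not the authors' implementation. Assumptions: the sense entries, Chinese translations, and topic labels (`TOPIC_FEELINGS`, `TOPIC_MONEY`) are invented stand-ins for LecDOCE entries and LLOCE topical sets; senses are clustered simply by a shared, hand-assigned topic label rather than by the paper's heuristic labeling algorithm; and the paper's "WSD rules" are replaced by a naive Bayes classifier with add-one smoothing. All function names are hypothetical.

```python
import math
from collections import Counter, defaultdict

# Hypothetical sense entries for "interest": (sense id, topic label, Chinese translations).
# Topic labels are illustrative stand-ins for LLOCE topical sets, not real LLOCE codes.
SENSES = [
    ("interest.1", "TOPIC_FEELINGS", {"兴趣", "关心"}),
    ("interest.2", "TOPIC_FEELINGS", {"爱好"}),
    ("interest.3", "TOPIC_MONEY", {"利息", "利率"}),
]

# Hypothetical bilingual example pairs (English, Chinese) used as training material.
EXAMPLES = [
    ("the bank raised the interest rate", "银行提高了利率"),
    ("she has a keen interest in music", "她对音乐很有兴趣"),
    ("he pays interest on the loan", "他为贷款支付利息"),
    ("reading is his main interest", "读书是他的主要爱好"),
    ("his interest in interest rates", "他对利率的兴趣"),  # matches both clusters: skipped
]


def tokenize(text, target):
    """Lowercase, strip simple punctuation, and drop the target word itself."""
    words = (w.strip(".,;:!?\"'") for w in text.lower().split())
    return [w for w in words if w and w != target]


def cluster_senses(senses):
    """Merge sense entries sharing a topic label into one coarse sense (a cluster)."""
    clusters = defaultdict(set)
    for _sense_id, topic, translations in senses:
        clusters[topic] |= translations
    return dict(clusters)


def tag_example(chinese, clusters):
    """Return the single cluster whose translations occur in the Chinese side, else None."""
    hits = [t for t, trans in clusters.items() if any(x in chinese for x in trans)]
    return hits[0] if len(hits) == 1 else None


def train(examples, clusters, target="interest"):
    """Count English context words per cluster over the automatically tagged examples."""
    counts, priors = defaultdict(Counter), Counter()
    for english, chinese in examples:
        topic = tag_example(chinese, clusters)
        if topic is not None:
            counts[topic].update(tokenize(english, target))
            priors[topic] += 1
    return counts, priors


def disambiguate(sentence, counts, priors, target="interest"):
    """Pick the cluster with the highest naive Bayes score (add-one smoothing)."""
    vocab_size = len(set().union(*counts.values()))
    total = sum(priors.values())
    best, best_score = None, -math.inf
    for topic, prior in priors.items():
        n = sum(counts[topic].values())
        score = math.log(prior / total)
        for w in tokenize(sentence, target):
            score += math.log((counts[topic][w] + 1) / (n + vocab_size))
        if score > best_score:
            best, best_score = topic, score
    return best


if __name__ == "__main__":
    clusters = cluster_senses(SENSES)
    counts, priors = train(EXAMPLES, clusters)
    print(disambiguate("the loan interest is due", counts, priors))      # TOPIC_MONEY
    print(disambiguate("an interest in music and art", counts, priors))  # TOPIC_FEELINGS
```

In this sketch, working with topic-level clusters rather than individual dictionary senses is what makes translation-based tagging feasible: fine senses that share a Chinese equivalent cannot be separated from the Chinese side alone, while the coarser clusters tend to differ in their translations. Examples whose Chinese side matches more than one cluster are skipped rather than guessed.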
Similar resources
Building A Chinese WordNet Via Class-Based Translation Model
Semantic lexicons are indispensable to research in lexical semantics and word sense disambiguation (WSD). For the study of WSD for English text, researchers have been using different kinds of lexicographic resources, including machine readable dictionaries (MRDs), machine readable thesauri, and bilingual corpora. In recent years, WordNet has become the most widely used resource for the study of...
The CQC Algorithm: Cycling in Graphs to Semantically Enrich and Enhance a Bilingual Dictionary: Extended abstract
Bilingual machine-readable dictionaries are knowledge resources useful in many automatic tasks. However, compared to monolingual computational lexicons like WordNet, bilingual dictionaries typically provide a lower amount of structured information such as lexical and semantic relations, and often do not cover the entire range of possible translations for a word of interest. In this paper we pre...
Disambiguation of Compound Noun Translations Extracted from Bilingual Comparable Corpora
Bilingual machine readable dictionaries are important and indispensable information resources for cross-language information retrieval, machine translation, and so on. In this paper, we describe a bilingual dictionary acquisition system which extracts translations from non-parallel but comparable corpora of a specific academic domain and disambiguates the extracted translations. We also experim...
The CQC Algorithm: Cycling in Graphs to Semantically Enrich and Enhance a Bilingual Dictionary
Bilingual machine-readable dictionaries are knowledge resources useful in many automatic tasks. However, compared to monolingual computational lexicons like WordNet, bilingual dictionaries typically provide a lower amount of structured information such as lexical and semantic relations, and often do not cover the entire range of possible translations for a word of interest. In this paper we pre...
Automatic WordNet Mapping Using Word Sense Disambiguation
This paper presents the automatic construction of a Korean WordNet from pre-existing lexical resources. A set of automatic WSD techniques is described for linking Korean words collected from a bilingual MRD to English WordNet synsets. We will show how individual linking provided by each WSD method is then combined to produce a Korean WordNet for nouns.
Publication date: 1996